INTRODUCTION


We describe experiments intended to exploit the potential of modern
microcomputers for harmonic analysis. Our findings are a contribution to
the discussion of how far modern microcomputers can complement, compete
with, and, in certain circumstances, substitute for the mainframe.

Harmonic analysis is fundamental to signal processing which, in turn, has
many applications in both civilian and military contexts. The publication
of so-called Fast Fourier Transform (FFT) algorithms revolutionised digital
analysis: results which had previously required many hours of computation
could be obtained in minutes. Microcomputers cannot yet compete with
mainframes in terms of speed, but they do have the important advantages of
portability and lower cost. Our experience shows that equipment exists
which combines the advantages of speed and an accuracy adequate for
contaminated data with the portability, availability, and versatility that
are characteristic of the microcomputer.

Our attention is confined to the Fast Fourier Transform, about which so much
has been written. For data length t we define the discrete Fourier
transform by

$$A_s = \frac{1}{t} \sum_{r=0}^{t-1} a_r \exp(-2\pi i r s / t), \qquad s = 0, 1, 2, \ldots, t-1. \qquad \text{(Eq. 1)}$$


The "time domain" function, $a_r$, can be recovered from its "frequency
domain" counterpart, $A_s$, by the formula

$$a_r = \sum_{s=0}^{t-1} A_s \exp(2\pi i r s / t), \qquad r = 0, 1, 2, \ldots, t-1. \qquad \text{(Eq. 2)}$$


Although  t  is  equal  to  a  power  of  2  in  the  experiments  the  conclusions  are 
almost  entirely  independent  of  this  condition. 
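
As a minimal sketch of Eq. 1 and Eq. 2, the following Python evaluates both sums directly in O(t^2) operations. It is not one of the FFT algorithms tested in this report; the function names `dft` and `idft` are illustrative, and the factor i in the exponent is assumed as in the standard definition.

```python
import cmath
import math

def dft(a):
    """Eq. 1: A_s = (1/t) * sum_{r=0}^{t-1} a_r * exp(-2*pi*i*r*s/t)."""
    t = len(a)
    # r*s is reduced modulo t so the argument of exp stays small.
    return [sum(a[r] * cmath.exp(-2j * math.pi * ((r * s) % t) / t)
                for r in range(t)) / t
            for s in range(t)]

def idft(A):
    """Eq. 2: a_r = sum_{s=0}^{t-1} A_s * exp(+2*pi*i*r*s/t)."""
    t = len(A)
    return [sum(A[s] * cmath.exp(2j * math.pi * ((r * s) % t) / t)
                for s in range(t))
            for r in range(t)]

if __name__ == "__main__":
    a = [complex(r, -r) for r in range(8)]  # arbitrary illustrative data
    back = idft(dft(a))
    # Round-trip error; expected to be close to machine precision.
    print(max(abs(x - y) for x, y in zip(a, back)))
```

With this normalisation the factor 1/t sits in the forward transform, so the inverse carries no scaling.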

1  TEST  FUNCTIONS


For experimental purposes it is convenient to use test functions, that is,
functions $a_r$ with a known discrete Fourier transform $A_s$. Such functions
are given in [1]. The functions TF1, TF2, and TF3 were used and are
described below.


(TF1)

$$a_r = \exp(2\pi i r^2 / t)$$

$$A_s = \frac{(1+i)\,(1 + i^{\,2s-t})\,\exp(-2\pi i s^2 / 4t)}{2\,t^{1/2}}$$

This pair generalises the so-called Gauss Sum, known in Number Theory and
signal processing, and applies to the "chirp" transform. Both $a_r$ and $A_s$
are complex.


(TF2)

$$a_r = r/t$$

$$A_s = \begin{cases} (1 - 1/t)/2, & s = 0, \\ \left(-1 + i \cot(\pi s / t)\right)/(2t), & 0 < s \le t-1. \end{cases}$$

This function is real in the time domain and its transform is complex.

(TF3)

$$a_r = r\,(1 - r/t)/t$$

$$A_s = \begin{cases} (1 - 1/t^2)/6, & s = 0, \\ -1 / \left(2 t^2 \sin^2(\pi s / t)\right), & 0 < s \le t-1. \end{cases}$$

This function and its transform are both real. In some cases a sinusoid
was used as input.

Test functions have certain benefits. The input function, $a_r$, can be
calculated to the full accuracy given by the computer; $A_s$ can be obtained
numerically by use of the algorithm. Then $A_s$ can be compared in various
ways with the values calculated to full machine accuracy from the inversion
formula. This supplements the commonly used procedure of getting
$A_s$ numerically by the algorithm, inverting numerically, and comparing the
result with $a_r$.
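
The comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the programs used in the experiments: the helper names (`dft`, `tf1`, `tf2`, `tf3`, `max_error`) are hypothetical, the transform is evaluated by the direct sum of Eq. 1 rather than by an FFT, and the TF1 transform is read as $(1+i)(1+i^{2s-t})\exp(-2\pi i s^2/4t)/(2t^{1/2})$.

```python
import cmath
import math

I_POWERS = (1, 1j, -1, -1j)  # i**k for k = 0, 1, 2, 3

def dft(a):
    """Direct evaluation of Eq. 1 (normalised by 1/t)."""
    t = len(a)
    return [sum(a[r] * cmath.exp(-2j * math.pi * ((r * s) % t) / t)
                for r in range(t)) / t
            for s in range(t)]

def tf1(t):
    """Complex chirp input and its closed-form transform."""
    a = [cmath.exp(2j * math.pi * ((r * r) % t) / t) for r in range(t)]
    A = [(1 + 1j) * (1 + I_POWERS[(2 * s - t) % 4])
         * cmath.exp(-2j * math.pi * ((s * s) % (4 * t)) / (4 * t))
         / (2 * math.sqrt(t))
         for s in range(t)]
    return a, A

def tf2(t):
    """Real ramp input; complex transform."""
    a = [r / t for r in range(t)]
    A = [(1 - 1 / t) / 2]
    A += [(-1 + 1j / math.tan(math.pi * s / t)) / (2 * t) for s in range(1, t)]
    return a, A

def tf3(t):
    """Real symmetric input; real transform."""
    a = [r * (1 - r / t) / t for r in range(t)]
    A = [(1 - 1 / t ** 2) / 6]
    A += [-1 / (2 * t ** 2 * math.sin(math.pi * s / t) ** 2) for s in range(1, t)]
    return a, A

def max_error(t, pair):
    """Largest modulus of (numerical transform - closed-form transform)."""
    a, exact = pair(t)
    return max(abs(x - y) for x, y in zip(dft(a), exact))

if __name__ == "__main__":
    for pair in (tf1, tf2, tf3):
        print(pair.__name__, max_error(32, pair))
```

Replacing `dft` by any FFT routine with the same normalisation gives the first kind of check described in the text; the forward-and-inverse round trip is the second.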

The choice of test functions was made to permit experimentation with
versions of the algorithms that were designed for full complex input and
output as well as modifications of the algorithms for real time domain data
and real and complex results in the frequency domain. The modifications
usually offer substantial savings in time.

2  COMPUTERS

Two types of personal computer suitable for scientific work were used in the
experiments: the Apple (Series II), with a Synertek 6502 processor and an
8-bit word, and a portable IBM PC, with an Intel 8088 microprocessor, an
8087 coprocessor, and a 16-bit word; the latter system is a powerful
combination. A Microsoft BASIC compiler and the standard Apple PASCAL
system were available for the Apple. The IBM PC used the Borland 2.0
Turbo-87 PASCAL compiler and the Microsoft 3.20 FORTRAN compiler; both use
the 8087 coprocessor. The Apple language system also offers FORTRAN under
the PASCAL operating system, but it was not used. Nevertheless, comparisons
made with PASCAL programs give a sufficient indication of the relative
speed of the two PC systems.


3  ALGORITHMS 

The  FFT  algorithms  used  in  the  experiments  are  all  based  on 
Cooley-Tukey  [2],  or  Sande-Tukey  [3],  sometimes  with  modifications  (see 
Singleton  [4])  for  recursive  calculations  of  sines  and  cosines.  Because  t 
is  a  power  of  2  the  basis  of  the  algorithms  is  the  expression  of  r  and  s  in 
the  scales  of  2  or  4,  combined  if  necessary.  Because  of  a  substantial 
variation  in  speed  between  the  algorithms,  most  of  the  experiments  were 
limited  to  the  fastest  performers.  Table  1  lists  the  algorithms  under  the 
names  we  have  used,  the  original  language,  and  the  reference. 


TABLE 1

ALGORITHMS AND THEIR SOURCE

Name                Original Language    Reference
MCGW/Singleton      ALGOL                [5], [11]
FFT/10              BASIC                [6]
BRENNER modified    FORTRAN              [7]
MONRO 4/5           FORTRAN              [8]
MONRO modified      FORTRAN              [9]
FOURT               FORTRAN              [10]

The  MONRO  algorithms  were  translated  into  both  BASIC  and  PASCAL.  FOURT  was 
translated  into  BASIC. 
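
For orientation, the radix-2 idea underlying the algorithms of Table 1 can be sketched as below. This is a generic recursive decimation-in-time form of the kind usually associated with Cooley-Tukey, written for clarity rather than speed; it is not a translation of any of the listed programs, and the names `fft_radix2` and `transform` are illustrative.

```python
import cmath
import math

def fft_radix2(a):
    """Unnormalised radix-2 FFT; len(a) must be a power of 2."""
    t = len(a)
    if t < 1 or t & (t - 1):
        raise ValueError("length must be a power of 2")
    if t == 1:
        return list(a)
    even = fft_radix2(a[0::2])   # transform of a_0, a_2, a_4, ...
    odd = fft_radix2(a[1::2])    # transform of a_1, a_3, a_5, ...
    half = t // 2
    out = [0j] * t
    for s in range(half):
        # Twiddle factor exp(-2*pi*i*s/t) applied to the odd half.
        w = cmath.exp(-2j * math.pi * s / t) * odd[s]
        out[s] = even[s] + w
        out[s + half] = even[s] - w
    return out

def transform(a):
    """Apply the 1/t normalisation of Eq. 1 to the unnormalised sums."""
    t = len(a)
    return [x / t for x in fft_radix2(a)]

if __name__ == "__main__":
    t = 16
    A = transform([r / t for r in range(t)])   # TF2 input
    print(A[0])  # compare with the closed form (1 - 1/t)/2
```

The sketch recomputes each twiddle factor with `cmath.exp`; the recursive sine and cosine generation mentioned above (Singleton [4]) replaces those calls with a recurrence, which matters on machines where trigonometric functions are slow.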

4  PRELIMINARY  TIMINGS

Table 2 gives timings to the nearest second for the Apple II. Test
Function 1 is used. The programs are MCGW, FFT/10, MONRO4, MONRO5 and
FOURT. MONRO4 is Algorithm AS83 taken from [8]; MONRO5 is a version of
MONRO4 that uses do-loops for unscrambling as suggested in [3]. D. M. Monro
claimed this version is faster, but application of MONRO5 to the Apple II
presented some difficulties; in the end, the speed gain, if any, was
insignificant.


TABLE 2

APPLE II TIMINGS: TEST FUNCTION 1

Timings (s)

         MCGW     FFT/10     MONRO4     MONRO4        MONRO5     MONRO5    FOURT
   t     PASCAL   Compiled   Compiled   PASCAL        Compiled   PASCAL    Compiled
                  BASIC      BASIC                    BASIC                BASIC

   32    10       4          1          [illegible]   1          -         1
   64    25       11         3          -             3          -         3
  128    62       24         8          -             8          14        7
  256    146      56         17         31            16         30        14
  512    341      128        38         69            37         69        32
 1024    783      284        80         148           79         144       70

Because speed is our main interest we shall say no more about MCGW and
FFT/10. Thus the MONRO series and FOURT remain the only serious contenders
in terms of speed. Although the MONRO and FOURT methodologies seem
identical, FOURT has the speed advantage because of some factor not yet
identified.
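
A quick arithmetic check on the Table 2 figures is to compare the growth in running time with the t log2 t operation count expected of an FFT. The sketch below does this for the Compiled BASIC timings of MONRO4 and FOURT at t = 256, 512 and 1024; the assignment of those values to columns is our reading of the table, and the names `TIMINGS`, `predicted_ratio` and `observed_ratio` are illustrative.

```python
import math

# Apple II timings in seconds, Compiled BASIC (values read from Table 2).
TIMINGS = {
    "MONRO4": {256: 17, 512: 38, 1024: 80},
    "FOURT": {256: 14, 512: 32, 1024: 70},
}

def predicted_ratio(t_small, t_large):
    """Ratio of t*log2(t) operation counts for two transform lengths."""
    return (t_large * math.log2(t_large)) / (t_small * math.log2(t_small))

def observed_ratio(name, t_small, t_large):
    """Ratio of measured timings for the named program."""
    return TIMINGS[name][t_large] / TIMINGS[name][t_small]

if __name__ == "__main__":
    for name in TIMINGS:
        for t_small, t_large in ((256, 512), (512, 1024)):
            print(name, t_small, t_large,
                  round(observed_ratio(name, t_small, t_large), 2),
                  round(predicted_ratio(t_small, t_large), 2))
```

Since the timings are rounded to the nearest second, only the larger transform lengths give meaningful ratios.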

Table 3 gives similar timings for runs in the MONRO series, BRENNER
modified, and MCGW using the IBM PC, all with TF1 (Eq. 1). The compilers
provided options to improve performance, as noted, for which a penalty in
compiling time has to be paid. The times relate exclusively to 1024 point
transforms.

TABLE 3

IBM PC + 8087 COPROCESSOR TIMINGS
(All timings relate to 1024 point transforms)

Test Function 1

Program           Language   Precision   Compiler Options   Time (s)
MONRO5            FORTRAN    Single      Yes                 3.7
MONRO5            FORTRAN    Double      Yes                 4.5
MONRO5            FORTRAN    Single      No                  5.9
MONRO5            PASCAL     Double      No                 13.6
BRENNERmod        FORTRAN    Single      Yes                14.3
BRENNERmod        FORTRAN    Single      No                 16.4
MCGW/Singleton    PASCAL     Double      No                 36.5

In comparison, the Brenner modified algorithm performs a 1024 point
transform on a Univac 1106 with an FPS pipeline processor in 0.006 seconds
in double precision FORTRAN.

FOURT was not run; it would probably be faster than MONRO5 under similar
conditions, but the differences in timing would amount to no more than
tenths of seconds.

Other mainframe timings of the MONRO algorithms are given in [8] and [9].

5  ACCURACY

So far we have not discussed accuracy systematically or in depth, for two
reasons. First, the other algorithms had slower speeds when compared with
the MONRO series and FOURT and were not further tested because our interest
was in speed. Second, inspection of the output indicates that the accuracy
attained by faster algorithms on the IBM PC was as high as the manufacturer
or user would wish when dealing with data contaminated by noise. This
conclusion is based on a forward and backward transformation of TF1,
comparing $a_r$ with the numerical inverse of its algorithmically obtained
transform.

The  first  three  series  of  the  Apple  II  experiments  were  carried  out  with 
the  MONRO  4  programs  in  Microsoft  Compiled  BASIC  only. 


Series 1

In this series the procedures were as follows:

(i) Calculate the input values $a_r$ for TF1 and the modulus $|a_r|$
(which is theoretically unity in this case).

(ii) Calculate by the algorithm the transform $\hat{A}_s$.

(iii) Calculate by the algorithm the inverse of $\hat{A}_s$, say $\hat{a}_r$, and form
$|\hat{a}_r|$.

(iv) Calculate

$$m_1 = \frac{1}{t} \sum_{r=0}^{t-1} \left( |a_r| - |\hat{a}_r| \right), \qquad m_2 = \left( s_2 - m_1^2 \right)^{1/2},$$

where

$$s_2 = \frac{1}{t} \sum_{r=0}^{t-1} \left( |a_r| - |\hat{a}_r| \right)^2.$$

$m_1$ and $m_2$ can be interpreted, respectively, as the mean and standard
deviation of the difference between $|a_r|$ and $|\hat{a}_r|$.
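
Steps (i) to (iv) can be sketched in Python as follows. The sketch uses direct evaluation of Eq. 1 and Eq. 2 in place of the MONRO programs and double precision in place of the Apple II arithmetic, so the numbers it produces are not comparable with those reported here; the function name `series1_stats` is illustrative.

```python
import cmath
import math

def dft(a):
    """Eq. 1, evaluated directly."""
    t = len(a)
    return [sum(a[r] * cmath.exp(-2j * math.pi * ((r * s) % t) / t)
                for r in range(t)) / t
            for s in range(t)]

def idft(A):
    """Eq. 2, evaluated directly."""
    t = len(A)
    return [sum(A[s] * cmath.exp(2j * math.pi * ((r * s) % t) / t)
                for s in range(t))
            for r in range(t)]

def series1_stats(a):
    """Return (m1, m2): mean and standard deviation of |a_r| - |a_hat_r|."""
    t = len(a)
    a_hat = idft(dft(a))                      # steps (ii) and (iii)
    d = [abs(x) - abs(y) for x, y in zip(a, a_hat)]
    m1 = sum(d) / t
    s2 = sum(v * v for v in d) / t
    # max(..., 0) guards against a tiny negative value caused by rounding.
    m2 = math.sqrt(max(s2 - m1 * m1, 0.0))
    return m1, m2

if __name__ == "__main__":
    t = 32
    a = [cmath.exp(2j * math.pi * ((r * r) % t) / t) for r in range(t)]  # TF1
    print(series1_stats(a))
```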

Series 2

The same procedures were used for this series of experiments which used the
Apple II. However, $m_1$ and $m_2$ are based on a comparison of $\operatorname{Re} A_s$, the real
part of the theoretical inverse, and $\operatorname{Re} \hat{A}_s$, the real part of the
algorithmically obtained inverse. Thus,

$$m_1 = \frac{1}{t} \operatorname{Re} \sum_{s} \left( A_s - \hat{A}_s \right), \qquad m_2 = \left( s_2 - m_1^2 \right)^{1/2},$$

where

$$s_2 = \frac{1}{t} \sum_{s} \left( \operatorname{Re} A_s - \operatorname{Re} \hat{A}_s \right)^2.$$


Series 3

In this series $\max_s |a_s - \hat{a}_s|$ is calculated.


Series 4

FOURT in Compiled BASIC was run using TF1. A criterion involving real and
imaginary parts was evaluated. The BASIC realisation of FOURT retained the
Fortran-type storage for complex data in an array B of length 2t. Thus
$b_{2r-1} = \operatorname{Re} a_r$, $b_{2r} = \operatorname{Im} a_r$, $r = 0, 1, 2, \ldots, (t-1)$. With obvious notation,
$\max |b_k - \hat{b}_k|$ is calculated for $1 \le k \le 2t$.
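
The storage scheme and the maximum-difference criterion can be illustrated with a short Python sketch. Python lists are indexed from 0, so the real and imaginary parts of the r-th value land at positions 2r and 2r+1 rather than at the 1-based positions of the array B; the function names are illustrative.

```python
def interleave(a):
    """Store complex values as [Re a_0, Im a_0, Re a_1, Im a_1, ...]."""
    b = []
    for z in a:
        b.extend((z.real, z.imag))
    return b

def max_abs_diff(b, b_hat):
    """Largest absolute difference between corresponding array entries."""
    return max(abs(x - y) for x, y in zip(b, b_hat))

if __name__ == "__main__":
    exact = interleave([1 + 2j, 3 - 4j])
    computed = interleave([1 + 2.5j, 3 - 4j])  # made-up perturbed values
    print(max_abs_diff(exact, computed))
```

Unlike the modulus-based measures, this criterion treats errors in the real and imaginary parts separately.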

The results of the four series of Apple II experiments are given in Table 4.


TABLE 4

CALCULATIONS OF ACCURACY

         Series 1                           Series 2                           Series 3     Series 4
   t     m1           m2                    m1                    m2           max          max
         (x 10^-10)   (scale illegible)     (scale illegible)     (x 10^-6)    (x 10^-9)    (x 10^-9)

   32    0.73         13.5                  2.6                   7.4          5.7          5.9
  128    2.15         16.6                  -0.83                 5.9          18.2         11.4
  512    1.35         43.3                  -2.1                  4.3          33.8         36.2
 1024    9.9          29.4                  0.86                  3.6          57.5         59.7

Table 4 shows a very satisfactory performance for both algorithms and
microcomputer on the basis of several criteria. The directly comparable
Series 3 and Series 4 show little difference. Note that in Series 2,
$m_2$ decreases as t increases.

More discussion in the context of mainframes is available in [11], [9] and
many other sources. Our work seems to be the first attempt to review
accuracy of the algorithms in the context of the microcomputer.


6  REAL  INPUT  DATA 


The final part of this report deals only with timings. Table 5 gives
values to the nearest second for the Apple II and test functions
TF2 (Eq. 2) and TF3 (Eq. 3).


TABLE 5

APPLE II TIMINGS: REAL $a_r$

Timings (s)

         TF2                                    TF3
   t     MONRO4   MONRO7   FOURT   FFT/10       MONRO4   MONRO8   FFT/10

   32    1        1        2       6            1        2        4
   64    3        3        3       10           3        3        [illegible]
  128    7        6        6       24           8        6        26
  256    17       14       14      56           17       13       68
  512    37       28       31      127          36       29       128
 1024    78       61       66      279          77       60       290

The following notes apply to the Table 5 data:

All runs were made in Microsoft Compiled BASIC.

MONRO4 is the full, complex algorithm; both MONRO7 and MONRO8 are faster
versions adapted to real time-domain data. All three algorithms are
described in references [8] and [9].

FOURT has been tested only with TF3.

The algorithms were not translated into PASCAL because the relative
performance of the Apple II in Compiled BASIC vis-a-vis PASCAL can be
seen in Table 2.

The following comments apply to the Apple II timings:

MONRO4 performs at about the same speed with real input as when the
input is complex. The specially adapted MONRO7 and MONRO8 offer
significant improvements.

Under TF3, FOURT is slightly slower than MONRO7. However, the authors
of FOURT claim that it can be made to run up to 40 percent faster with
real data. This claim should be tested.

Like MONRO4, FFT/10 runs at much the same speed as with a complex input.

On the IBM PC, the special MONRO algorithm for real data in its
original FORTRAN and for a t=1024 transform requires 2.8 seconds in
single precision and 3.6 seconds in double; in each case special
compiler optimisation options are used.
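
One standard way in which real time-domain data can be exploited is to pack the even and odd samples into a single complex sequence of half the length, transform that, and then separate the two halves. The Python sketch below illustrates the idea; it is not claimed to be the method used in MONRO7, MONRO8 or FOURT, the half-length transform is evaluated by a direct sum for brevity (an FFT would be substituted in practice), and the function names are illustrative.

```python
import cmath
import math

def dft_unnormalised(z):
    """Direct sum Z_s = sum_k z_k exp(-2*pi*i*k*s/n), without the 1/n factor."""
    n = len(z)
    return [sum(z[k] * cmath.exp(-2j * math.pi * ((k * s) % n) / n)
                for k in range(n))
            for s in range(n)]

def real_transform(a):
    """Eq. 1 transform of real data of even length t using one complex
    transform of length t/2."""
    t = len(a)
    if t < 2 or t % 2:
        raise ValueError("length must be even")
    n = t // 2
    # Pack even-indexed samples into the real part, odd into the imaginary.
    z = [complex(a[2 * k], a[2 * k + 1]) for k in range(n)]
    Z = dft_unnormalised(z)
    A = [0j] * t
    for s in range(n):
        zc = Z[(n - s) % n].conjugate()
        even = (Z[s] + zc) / 2      # transform of a_0, a_2, a_4, ...
        odd = (Z[s] - zc) / 2j      # transform of a_1, a_3, a_5, ...
        w = cmath.exp(-2j * math.pi * s / t) * odd
        A[s] = (even + w) / t
        A[s + n] = (even - w) / t
    return A

if __name__ == "__main__":
    t = 16
    A = real_transform([r * (1 - r / t) / t for r in range(t)])  # TF3 input
    print(A[0].real)  # compare with the closed form (1 - 1/t**2)/6
```

Because the dominant cost is a complex transform of length t/2 instead of t, a saving of this general kind is consistent with the improvement of the real-data versions over MONRO4 in Table 5.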


CONCLUSIONS 


Although  unsystematic,  the  results  of  the  experiments  support  the  view  that 
the  modern  personal  computer  has  an  increasingly  important  role  to  play  in 
signal  processing  applications  in  the  field  and  in  the  laboratory.  When  a 
sophisticated  requirement  calls  for  thousands  of  Fourier  transforms  in 
seconds  (or  even  minutes)  the  mainframe  continues  to  hold  its  own.  The 
attractive  attributes  of  the  personal  computer  are  its  availability,  price, 
size,  and  portability  as  compared  with  older  systems. 

The  Apple  II  and  the  IBM  PC  microcomputers  performed  well  in  implementing 
efficient  algorithms  on  the  demanding  complex  Fast  Fourier  Transform  test 
function  (TF1).  Both  personal  computers  can  be  applied  to  a  wide  range  of 
practical  situations  including  the  use  of  the  FFT  as  an  approximation  to 
the  Fourier  integral. 

For  8  and  16  bit  computers  we  have  found  the  Apple  II  and  IBM  PC  to  be 
excellent  tools  in  a  wider  field  of  scientific  applications  that  include 
traditional  numerical  analysis  and  statistical  data  processing  and 
inference,  and  Monte  Carlo  simulation.  They  provide  scientists  with  a 
fast,  reliable  and  accurate  tool  at  a  reasonable  price.  This  comment 
applies  also  to  other  professional  personal  computers  in  the  field, 
although  only  the  Apple  II  and  IBM  PC  were  tested. 

Detailed  statistics  on  the  speed  of  various  processors  ranging  from  the 
fastest  mainframes  to  microcomputers  are  found  in  [13].  The  tests 
described  in  that  report  were  conducted  in  the  specialised  environment  of 
the  solution  of  dense  systems  of  linear  equations;  however  the  absolute 
comparisons are likely to be valid in a wider context.